Load-balanced and locality-aware scheduling for data-intensive workloads at extreme scales
Abstract
Data-driven programming models such as many-task computing (MTC) have become prevalent for running data-intensive scientific applications. MTC applies over-decomposition to enable distributed scheduling. To achieve extreme scalability, MTC proposes a fully distributed task scheduling architecture that employs as many schedulers as compute nodes to make scheduling decisions. Balancing load across schedulers and exploiting data locality are two important goals for achieving the best performance in distributed scheduling of data-intensive applications. Our previous research proposed a data-aware work stealing technique that optimizes both load balancing and data locality by using both dedicated and shared task ready queues in each scheduler. Tasks were organized in queues based on input data size and location. A distributed key-value store was used to manage task metadata. We implemented the technique in MATRIX, a distributed MTC task execution framework. In this work, we devise an analytical sub-optimal upper bound of the proposed technique; compare MATRIX with other scheduling systems; and explore the scalability of the technique at extreme scales. Results show that the technique is not only scalable but also achieves performance within 15% of the sub-optimal solution. Copyright © 2015 John Wiley & Sons, Ltd.
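The two-queue idea described in the abstract can be illustrated with a minimal Python sketch. This is not the MATRIX implementation: the size threshold `PIN_THRESHOLD_BYTES`, the steal-half policy, the victim sampling in `pick_victim`, and all class and field names are hypothetical choices made for illustration, and the distributed key-value store for task metadata is not modeled. The sketch assumes that tasks with large local input go to a dedicated queue that is never stolen from, while all other tasks go to a shared queue that idle schedulers may steal from.

```python
import random
from collections import deque
from dataclasses import dataclass, field

# Hypothetical threshold: tasks with at least this much local input stay pinned.
PIN_THRESHOLD_BYTES = 1_000_000


@dataclass
class Task:
    name: str
    data_node: int  # node that holds the task's input data
    data_size: int  # input size in bytes


@dataclass
class Scheduler:
    node_id: int
    dedicated: deque = field(default_factory=deque)  # run locally, never stolen
    shared: deque = field(default_factory=deque)     # eligible for stealing

    def submit(self, task: Task) -> None:
        # Queue choice depends on input data location and size (assumed rule).
        pinned = (task.data_node == self.node_id
                  and task.data_size >= PIN_THRESHOLD_BYTES)
        (self.dedicated if pinned else self.shared).append(task)

    def next_task(self):
        # Prefer locality-bound work, then fall back to stealable work.
        if self.dedicated:
            return self.dedicated.popleft()
        if self.shared:
            return self.shared.popleft()
        return None

    def steal_from(self, victim: "Scheduler") -> int:
        # Assumed policy: take half (rounded up) of the victim's shared tasks.
        count = (len(victim.shared) + 1) // 2
        for _ in range(count):
            self.shared.append(victim.shared.pop())
        return count


def pick_victim(thief, schedulers, rng, sample_size=2):
    # Sample a few neighbours and choose the one with the most stealable work.
    candidates = [s for s in schedulers if s is not thief]
    sample = rng.sample(candidates, min(sample_size, len(candidates)))
    return max(sample, key=lambda s: len(s.shared))


# Usage: node 0 is overloaded, node 1 is idle and steals only shared tasks.
schedulers = [Scheduler(i) for i in range(3)]
for i in range(4):
    schedulers[0].submit(Task(f"big{i}", data_node=0, data_size=5_000_000))
for i in range(6):
    schedulers[0].submit(Task(f"small{i}", data_node=0, data_size=1_000))

thief = schedulers[1]
victim = pick_victim(thief, schedulers, random.Random(0))
stolen = thief.steal_from(victim)
```

In this sketch the large-input tasks never leave node 0, so stealing rebalances load without moving their data. Only the small-input tasks migrate, which is the trade-off the two queues are meant to express.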
Similar resources
Accelerating Large Scale Scientific Exploration through Data Diffusion
Scientific and data-intensive applications often require exploratory analysis on large datasets, which is often carried out on large scale distributed resources where data locality is crucial to achieve high system throughput and performance. We propose a “data diffusion” approach that acquires resources for data analysis dynamically, schedules computations as close to data as possible, and rep...
A Data Locality Aware Online Scheduling Approach for I/O-Intensive Jobs with File Sharing
Many scientific investigations have to deal with large amounts of data from simulations and experiments. Data analysis in such investigations typically involves extraction of subsets of data, followed by computations performed on extracted data. Scheduling in this context requires efficient utilization of the computational, storage and network resources to optimize response time. The data-inten...
Data Diffusion: Dynamic Resource Provision and Data-Aware Scheduling for Data Intensive Applications
Data intensive applications often involve the analysis of large datasets that require large amounts of compute and storage resources. While dedicated compute and/or storage farms offer good task/data throughput, they suffer from low resource utilization under varying workload conditions. If we instead move such data to distributed computing resources, then we incur expensive data transfer c...
A Load Balancing Strategy for Iterated Parallel Loop Scheduling
An efficient template for the implementation on distributed-memory multiprocessors of iterated parallel loops, i.e. parallel loops nested in a sequential loop, is presented. The template is explicitly designed to smooth unbalanced processor workloads deriving from loops whose iterations are characterized by highly varying execution times. Experiments conducted show performance gains w.r.t. HPF-l...
Achieving Data-Aware Load Balancing through Distributed Queues and Key/Value Stores
Load balancing techniques (e.g. work stealing) are important to obtain the best performance for distributed task scheduling systems. In work stealing, tasks are randomly migrated from heavily loaded schedulers to idle ones. However, for data-intensive applications where tasks are dependent and task execution involves processing large amounts of data, migrating tasks blindly would compromise the dat...
Journal title:
- Concurrency and Computation: Practice and Experience
Volume 28
Publication date: 2016